Parsing spontaneous speech
نویسنده
چکیده
In this paper we will present work carried out lately on the 50,000 words Italian Spontaneous Speech Corpus called AVIP, under national project API, made available for free download from the website of the coordinator, the University of Naples. We will concentrate on the tuning of the parser for Italian which had been previously used to parse 100,000 words corpus of written Italian within the National Treebank initiative coordinated by ILC in Pisa. The parser receives as input the adequately transformed orthographic transcription of the dialogues making up the corpus, in which pauses, hesitations and other disfluencies have been turned into most likely corresponding punctiation marks, interjections or truncation of the word underlying the uttered segment. The most interesting phenomenon we will discuss is without any doubts "overlapping", i.e. a speech event in which two people speak at the same time by uttering actual words or in some cases nonwords, when one of the speakers, usually the one which is not the current turntaker, interrupts the current speaker. This phenomenon takes place at a certain point in time where it has to be anchored to the speech signal but in order to be fully parsed and subsequently semantically interpreted, it needs to be referred semantically to a following turn.
منابع مشابه
An improved joint model: POS tagging and dependency parsing
Dependency parsing is a way of syntactic parsing and a natural language that automatically analyzes the dependency structure of sentences, and the input for each sentence creates a dependency graph. Part-Of-Speech (POS) tagging is a prerequisite for dependency parsing. Generally, dependency parsers do the POS tagging task along with dependency parsing in a pipeline mode. Unfortunately, in pipel...
متن کاملAdapting dependency parsing to spontaneous speech for open domain spoken language understanding
Parsing human-human conversations consists in automatically enriching text transcription with semantic structure information. We use in this paper a FrameNet-based approach to semantics that, without needing a full semantic parse of a message, goes further than a simple flat translation of a message into basic concepts. FrameNet-based semantic parsing may follow a syntactic parsing step, howeve...
متن کاملSpoken Language Understanding in Spoken Dialog System for Travel Information Accessing
This paper briefly introduces the VOTIRS 2.0 system — a Chinese spoken dialog system for travel information accessing. Through this system, users can query the travel information about 52 routes and finally make a transaction for a proper travel plan. The strategy and method to understand spontaneous speech in the system are discussed in detail. To understand spontaneous speech, a Semantic Cons...
متن کاملParsing spoken language without syntax
Parsing spontaneous speech is a difficult task because of the ungrammatical nature of most spoken utterances. To overpass this problem, we propose in this paper to handle the spoken language without considering syntax. We describe thus a microsemantic parser which is uniquely based on an associative network of semantic priming. Experimental results on spontaneous speech show that this parser st...
متن کاملParsing Spoken Language without Syntax : a Microsemantic Approach
Parsing spontaneous speech is a difficult task because of the ungrammatical nature of most spoken utterances. To overpass this problem, we propose in this paper to handle the spoken language without considering syntax. We describe thus a microsemantic parser which is uniquely based on an associative network of semantic priming. Experimental results on spontaneous speech show that this parser st...
متن کاملStudy on Detection of Prosodic Phrase Boundaries in Spontaneous Speech
Prosodic information, which has the abilities of disambiguation, improving the parsing of the spoken language and predicting recognition errors, becomes more and more important in speech recognition and understanding, especially in spontaneous speech. In this paper, we investigate the detection of the phrase boundaries by prosodic features in the domain-specified Chinese spontaneous speech. The...
متن کامل